
Conversation

@comfyanonymous (Owner)

Add --disable-async-offload to disable it.

If this causes OOMs that go away when you pass --disable-async-offload, please report it.
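
For context, "async offload" here generally means moving model weights between GPU and CPU on a side CUDA stream so the copies overlap with compute instead of stalling it. Below is a minimal sketch of that general technique, not ComfyUI's actual implementation; the stream handling, pinned-buffer management, and function names are all illustrative:

    import torch

    # Minimal sketch of async weight offloading (illustrative only).
    # Copies run on a side CUDA stream so they can overlap with compute on
    # the default stream; truly asynchronous H2D/D2H copies also require
    # pinned (page-locked) CPU memory. Assumes weights start on the CPU.
    offload_stream = torch.cuda.Stream()

    def prefetch_to_gpu(module: torch.nn.Module, device: str = "cuda"):
        """Start moving a module's weights to the GPU without blocking compute."""
        with torch.cuda.stream(offload_stream):
            for p in module.parameters():
                # non_blocking=True only overlaps if the source tensor is pinned
                p.data = p.data.pin_memory().to(device, non_blocking=True)

    def wait_for_weights():
        """Make the default stream wait for the prefetch copies to finish.

        Skipping this sync risks computing with half-copied weights; syncing
        too eagerly serializes the streams and loses the overlap benefit.
        """
        torch.cuda.current_stream().wait_stream(offload_stream)

Schemes like this keep extra staging buffers alive, which is one plausible way they could raise peak memory; hence the escape hatch. To opt out, launch ComfyUI with the new flag, e.g. python main.py --disable-async-offload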

@comfyanonymous merged commit 9d8a817 into master on Nov 27, 2025
12 checks passed
@comfyanonymous deleted the temp_pr branch on November 27, 2025 at 22:46
@mohtaufiq175

Hm… what does async offload actually do? Is it something like a VRAM-saving option?

I just updated to the latest commit, and what I noticed is that my usual text2image workflow, which combines SDXL and Wan 2.2 low noise, now takes almost twice as long on the first run.
I also noticed that when the job enters the Wan 2.2 stage, my VRAM usage only reaches about half of the available VRAM.

On subsequent runs it is slightly slower than with async offload disabled.
For example, Wan 2.2 t2i normally takes around 10s/it, but now it's more like 12–13s/it.

rattus128 added a commit to rattus128/ComfyUI that referenced this pull request Nov 29, 2025
According to git grep, this is not used now, and was not used in the
initial commit that introduced it (see below).

This semantic is difficult to implement for a temporal-roll VAE (and
supporting it would defeat the purpose). Rather than implement the complex
if, just delete the unused feature (a paraphrased sketch of the dead flag
follows the transcript below).

(venv) rattus@rattus-box2:~/ComfyUI$ git log --oneline
220afe3 (HEAD) Initial commit.
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py:                 resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py:        self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py:        if self.give_pre_end:

(venv) rattus@rattus-box2:~/ComfyUI$ git co origin/master
Previous HEAD position was 220afe3 Initial commit.
HEAD is now at 9d8a817 Enable async offloading by default on Nvidia. (comfyanonymous#10953)
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py:                 resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py:        self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py:        if self.give_pre_end:
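
For reference, those grep hits correspond to a dead-flag pattern along these lines; this is a paraphrased, runnable sketch, not the verbatim ComfyUI code, and the layer sizes are illustrative:

    import torch

    class Decoder(torch.nn.Module):
        # Paraphrased sketch of the unused give_pre_end flag from
        # comfy/ldm/modules/diffusionmodules/model.py; details illustrative.
        def __init__(self, z_channels=4, out_channels=3, give_pre_end=False):
            super().__init__()
            self.give_pre_end = give_pre_end  # no caller ever passes True
            self.blocks = torch.nn.Conv2d(z_channels, 64, 3, padding=1)
            self.norm_out = torch.nn.GroupNorm(32, 64)
            self.conv_out = torch.nn.Conv2d(64, out_channels, 3, padding=1)

        def forward(self, z):
            h = self.blocks(z)
            if self.give_pre_end:  # the dead branch the commit above deletes
                return h           # would return features before norm + conv_out
            return self.conv_out(torch.nn.functional.silu(self.norm_out(h)))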
comfyanonymous pushed a commit that referenced this pull request Dec 3, 2025
…Kandinsky) (#10995)

* hunyuan upsampler: rework imports

Remove the transitive imports of VideoConv3d and Resnet and take these
from the actual implementation source.

* model: remove unused give_pre_end

According to git grep, this is not used now, and was not used in the
initial commit that introduced it (see below).

This semantic is difficult to implement for a temporal-roll VAE (and
supporting it would defeat the purpose). Rather than implement the complex
if, just delete the unused feature.

(venv) rattus@rattus-box2:~/ComfyUI$ git log --oneline
220afe3 (HEAD) Initial commit.
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py:                 resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py:        self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py:        if self.give_pre_end:

(venv) rattus@rattus-box2:~/ComfyUI$ git co origin/master
Previous HEAD position was 220afe3 Initial commit.
HEAD is now at 9d8a817 Enable async offloading by default on Nvidia. (#10953)
(venv) rattus@rattus-box2:~/ComfyUI$ git grep give_pre
comfy/ldm/modules/diffusionmodules/model.py:                 resolution, z_channels, give_pre_end=False, tanh_out=False, use_linear_attn=False,
comfy/ldm/modules/diffusionmodules/model.py:        self.give_pre_end = give_pre_end
comfy/ldm/modules/diffusionmodules/model.py:        if self.give_pre_end:

* move refiner VAE temporal roller to core

Move the carrying conv op to the common VAE code and give it a better
name. Roll the carry implementation logic for Resnet into the base
class and scrap the Hunyuan-specific subclass.

* model: Add temporal roll to main VAE decoder

If there are no attention layers, it's a standard resnet, and VideoConv3d
is asked for, substitute in the temporal-rolling VAE algorithm. This
reduces VAE memory usage by the temporal dimension (can be huge VRAM savings).

* model: Add temporal roll to main VAE encoder

If there are no attention layers, it's a standard resnet, and VideoConv3d
is asked for, substitute in the temporal-rolling VAE algorithm. This
reduces VAE memory usage by the temporal dimension (can be huge VRAM
savings). A sketch of the temporal-roll idea follows this commit message.
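
For intuition, the temporal roll processes the video latent a few frames at a time, carrying the trailing (kernel_t - 1) frames of each chunk forward as causal context for the next, so peak activation memory scales with the chunk size rather than the full temporal length. A minimal sketch of that idea follows; the function name, chunking scheme, and the stride-1/no-temporal-padding assumptions are illustrative, not the PR's implementation:

    import torch

    def temporal_roll_conv3d(conv: torch.nn.Conv3d, x: torch.Tensor,
                             chunk_t: int = 4):
        # Apply `conv` over the time axis of x (B, C, T, H, W) in chunks of
        # `chunk_t` frames, carrying the last (kernel_t - 1) input frames of
        # each chunk into the next so the result matches one full-tensor
        # convolution. Assumes stride 1, no temporal padding (the carry
        # supplies it), and chunk_t >= the temporal kernel size. Peak
        # activation memory now scales with chunk_t instead of T, which is
        # where the VRAM saving comes from.
        kt = conv.kernel_size[0]
        carry = x[:, :, :0]  # empty carry for the first chunk
        outs = []
        for t in range(0, x.shape[2], chunk_t):
            chunk = torch.cat([carry, x[:, :, t:t + chunk_t]], dim=2)
            outs.append(conv(chunk))
            carry = chunk[:, :, -(kt - 1):] if kt > 1 else x[:, :, :0]
        return torch.cat(outs, dim=2)

    # Example (hypothetical shapes): spatial padding is fine, temporal is not.
    # conv = torch.nn.Conv3d(8, 8, kernel_size=3, padding=(0, 1, 1))
    # y = temporal_roll_conv3d(conv, torch.randn(1, 8, 16, 32, 32))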